Measuring Tie Strength in Implicit Social Networks
Given a set of people and a set of events they attend, we address the problem
of measuring connectedness or tie strength between each pair of persons given
that attendance at mutual events gives an implicit social network between
people. We take an axiomatic approach to this problem. Starting from a list of
axioms that a measure of tie strength must satisfy, we characterize functions
that satisfy all the axioms and show that there is a range of measures that
satisfy this characterization. A measure of tie strength induces a ranking on
the edges (and on the set of neighbors for every person). We show that for
applications where the ranking, and not the absolute value of the tie strength,
is what matters, the axioms are equivalent to a natural partial order. We also
show that, to settle on a particular measure, one must make a non-obvious
decision about extending this partial order to a total order, and that this
decision is best left to particular applications. We classify
measures found in prior literature according to the axioms that they satisfy.
In our experiments, we measure tie strength and the coverage of our axioms in
several datasets. Also, for each dataset, we bound the maximum Kendall's Tau
divergence (which measures the number of pairwise disagreements between two
lists) between all measures that satisfy the axioms using the partial order.
This tells us whether a particular dataset is well behaved, in which case we
need not worry about which measure to choose, or whether we must be careful
about the exact choice of measure.
Comment: 10 pages
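To make the ranking argument concrete, here is a minimal sketch of two tie-strength measures computed from event attendance and of the Kendall's tau distance between the edge rankings they induce. The two measures (a raw count of shared events, and a count in which each event contributes 1/(size - 1)) are illustrative choices of ours, not measures taken from the paper, and the function names are hypothetical. The distance counts only strictly discordant pairs and ignores ties, which is one of several conventions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def common_event_count(attendance: Dict[str, Set[str]]) -> Dict[Pair, float]:
    """Tie strength = number of events both people attended."""
    strength: Dict[Pair, float] = defaultdict(float)
    for people in attendance.values():
        for u, v in combinations(sorted(people), 2):
            strength[(u, v)] += 1.0
    return dict(strength)


def size_discounted_count(attendance: Dict[str, Set[str]]) -> Dict[Pair, float]:
    """Each shared event contributes 1/(size - 1), so large events count for less."""
    strength: Dict[Pair, float] = defaultdict(float)
    for people in attendance.values():
        if len(people) < 2:
            continue  # one attendee creates no tie (and would divide by zero)
        weight = 1.0 / (len(people) - 1)
        for u, v in combinations(sorted(people), 2):
            strength[(u, v)] += weight
    return dict(strength)


def kendall_tau_distance(a: Dict[Pair, float], b: Dict[Pair, float]) -> int:
    """Count edge pairs that the two measures order in opposite directions.

    Pairs tied under either measure are not counted as disagreements.
    """
    edges = sorted(set(a) & set(b))
    return sum(
        1
        for e, f in combinations(edges, 2)
        if (a[e] - a[f]) * (b[e] - b[f]) < 0
    )


if __name__ == "__main__":
    attendance = {"e1": {"A", "B"}, "e2": {"A", "C", "D", "E"}, "e3": {"A", "C", "D", "E"}}
    c = common_event_count(attendance)
    s = size_discounted_count(attendance)
    print(kendall_tau_distance(c, s))  # 6
```

In the toy data, the raw count ranks A-B below every tie formed at the two larger events, while the size-discounted count ranks it above them, so the two measures disagree on 6 edge pairs even though both reward attending more events together.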
L2P: An Algorithm for Estimating Heavy-tailed Outcomes
Many real-world prediction tasks have outcome variables that have
characteristic heavy-tail distributions. Examples include copies of books sold,
auction prices of art pieces, and demand for commodities in warehouses. A model
that learns the heavy-tailed distribution can predict "big and rare" instances
(e.g., the best-sellers) accurately. Most existing approaches are not dedicated
to learning heavy-tailed distributions; thus, they heavily
under-predict such instances. To tackle this problem, we introduce Learning to
Place (L2P), which exploits the pairwise relationships between instances for
learning. In its training phase, L2P learns a pairwise preference classifier:
is instance A > instance B? In its placing phase, L2P obtains a prediction by
placing the new instance among the known instances. Based on its placement, the
new instance is then assigned a value for its outcome variable. Experiments on
real data show that L2P outperforms competing approaches in terms of accuracy
and in its ability to reproduce the heavy-tailed outcome distribution. In addition, L2P
provides an interpretable model by placing each predicted instance in relation
to its comparable neighbors. Interpretable models are highly desirable when
lives and treasure are at stake.
Comment: 9 pages, 6 figures, 2 tables. Nature of changes from previous version:
1. Added complexity analysis in Section 2.2.
2. Changed datasets.
3. Added LambdaMART to the baseline methods, with a brief discussion of why
LambdaMART failed on our problem.
4. Updated figures.
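The training and placing phases described above can be sketched as follows. This is a minimal sketch under our own assumptions, not the authors' implementation: the pairwise preference classifier is a linear logistic model on feature differences trained by plain SGD, and the placement rule (count how many known instances the new one is predicted to beat, then take the midpoint of the neighboring known outcomes) is a hypothetical choice. `train_pairwise` and `place` are illustrative names.

```python
import math
from itertools import permutations
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(X: List[List[float]], y: Sequence[float],
                   epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Fit P(A > B) = sigmoid(w . (x_A - x_B)) on all ordered pairs."""
    pairs = []
    for i, j in permutations(range(len(X)), 2):
        if y[i] == y[j]:
            continue  # tied outcomes carry no preference
        diff = [a - b for a, b in zip(X[i], X[j])]
        pairs.append((diff, 1.0 if y[i] > y[j] else 0.0))
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for diff, label in pairs:
            g = _sigmoid(_dot(w, diff)) - label  # log-loss gradient w.r.t. the logit
            w = [wk - lr * g * dk for wk, dk in zip(w, diff)]
    return w


def place(w: List[float], X_known: List[List[float]], y_known: Sequence[float],
          x_new: List[float]) -> float:
    """Place x_new among known instances and read off an outcome value."""
    wins = sum(
        1 for xk in X_known
        if _sigmoid(_dot(w, [a - b for a, b in zip(x_new, xk)])) > 0.5
    )
    ys = sorted(y_known)
    if wins == 0:
        return ys[0]
    if wins == len(ys):
        return ys[-1]
    return (ys[wins - 1] + ys[wins]) / 2  # midpoint between the neighbors
```

Because the prediction is read off the sorted known outcomes instead of being regressed toward the mean, an instance placed near the top of the ranking receives a tail-sized value; the flip side of this particular rule is that it never predicts beyond the largest outcome seen in training.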
HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks
The unsupervised detection of anomalies in time series data has important
applications in user behavioral modeling, fraud detection, and cybersecurity.
Anomaly detection has, in fact, been extensively studied in categorical
sequences. However, we often have access to time series data that represent
paths through networks. Examples include transaction sequences in financial
networks, click streams of users in networks of cross-referenced documents, or
travel itineraries in transportation networks. To reliably detect anomalies, we
must account for the fact that such data contain a large number of independent
observations of paths constrained by a graph topology. Moreover, the
heterogeneity of real systems rules out frequency-based anomaly detection
techniques, which do not account for highly skewed edge and degree statistics.
To address this problem, we introduce HYPA, a novel framework for the
unsupervised detection of anomalies in large corpora of variable-length
temporal paths in a graph. HYPA provides an efficient analytical method to
detect paths with anomalous frequencies that result from nodes being traversed
in unexpected chronological order.
Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM
Data Mining (SDM 2020).
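The core idea of scoring path frequencies against a null model that respects skewed edge statistics can be illustrated with a simplified sketch. Everything specific here is our assumption, not HYPA's exact construction: we only consider paths of length two, weight each possible transition (a, b, c) by out(a, b) * in(b, c), where out(a, b) counts paths starting with edge (a, b) and in(b, c) counts paths ending with edge (b, c), and score each transition by the hypergeometric probability P(X <= observed count). HYPA itself handles variable-length paths; the function names are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Dict, List, Tuple

Path = Tuple[str, str, str]


def hypergeom_cdf(k: int, M: int, K: int, N: int) -> float:
    """P(X <= k) for X ~ Hypergeometric(population M, K successes, N draws).

    Exact integer arithmetic; only suitable for small populations.
    """
    total = comb(M, N)
    return sum(comb(K, i) * comb(M - K, N - i) for i in range(k + 1)) / total


def path_anomaly_scores(paths: List[Path]) -> Dict[Path, float]:
    """Score every possible length-2 transition; near 0 = under-, near 1 = over-represented."""
    observed = Counter(paths)
    out_w = Counter((a, b) for a, b, _ in paths)  # times edge (a, b) starts a path
    in_w = Counter((b, c) for _, b, c in paths)   # times edge (b, c) ends a path
    # possible transitions: the first edge ends where the second one begins
    candidates = [(a, b, c) for (a, b) in out_w for (b2, c) in in_w if b == b2]
    xi = {t: out_w[t[0], t[1]] * in_w[t[1], t[2]] for t in candidates}
    M = sum(xi.values())  # population size of the null model
    N = len(paths)        # number of observed paths (draws)
    return {t: hypergeom_cdf(observed[t], M, xi[t], N) for t in candidates}
```

With five observations each of a->b->c and x->b->y, the never-observed transition a->b->y gets a score below 0.05, flagging it as under-represented relative to what its edge statistics alone would predict, while a->b->c scores above 0.95.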
Disentangling Node Attributes from Graph Topology for Improved Generalizability in Link Prediction
Link prediction is a crucial task in graph machine learning with diverse
applications. We explore the interplay between node attributes and graph
topology and demonstrate that incorporating pre-trained node attributes
improves the generalization power of link prediction models. Our proposed
method, UPNA (Unsupervised Pre-training of Node Attributes), solves the
inductive link prediction problem by learning a function that takes a pair of
node attributes and predicts the probability of an edge, as opposed to Graph
Neural Networks (GNNs), which can be prone to topological shortcuts in graphs
with power-law degree distribution. In this manner, UPNA learns a significant
part of the latent graph generation mechanism since the learned function can be
used to add incoming nodes to a growing graph. By leveraging pre-trained node
attributes, we overcome observational bias and make meaningful predictions
about unobserved nodes, surpassing state-of-the-art performance (3X to 34X
improvement on benchmark datasets). UPNA can be applied to various pairwise
learning tasks and integrated with existing link prediction models to enhance
their generalizability and bolster graph generative models.
Comment: 17 pages, 6 figures
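The key property claimed above, a function of a pair of node attributes that can score edges for nodes never seen during training, can be sketched with a simple stand-in. This is a minimal sketch, assuming a logistic model over the element-wise product of the two attribute vectors; the classifier UPNA actually uses may differ, the toy vectors below stand in for pre-trained attributes, and `train_link_model` and `edge_probability` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _features(xu: Sequence[float], xv: Sequence[float]) -> List[float]:
    # symmetric pair representation: element-wise product of the attribute vectors
    return [a * b for a, b in zip(xu, xv)]


def train_link_model(attrs: Dict[int, List[float]],
                     positives: List[Tuple[int, int]],
                     negatives: List[Tuple[int, int]],
                     epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit logistic regression on pair features; edges = 1, sampled non-edges = 0."""
    d = len(next(iter(attrs.values())))
    w, b = [0.0] * d, 0.0
    data = [(u, v, 1.0) for u, v in positives] + [(u, v, 0.0) for u, v in negatives]
    for _ in range(epochs):
        for u, v, label in data:
            f = _features(attrs[u], attrs[v])
            g = _sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - label
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def edge_probability(model: Tuple[List[float], float],
                     xu: Sequence[float], xv: Sequence[float]) -> float:
    """Score a pair from attributes alone, so unseen nodes can be scored too."""
    w, b = model
    f = _features(xu, xv)
    return _sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
```

Because the model never sees node identities or degrees, a new node with attribute vector [1.0, 0.0] can be scored against existing nodes directly; in a toy graph where nodes link only to nodes with matching attributes, it receives a high probability toward matching nodes and a low one toward the others.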